Leveraging Subjective Human Annotation for Clustering Historic Newspaper Articles
Authors
Haimonti Dutta, The Center for Computational Learning Systems
William Chan, Department of Computer Science
Deepak Shankargouda, Department of Computer Science
Manoj Pooleery, The Center for Computational Learning Systems
Axinia Radeva, The Center for Computational Learning Systems
Kyle Rego, Department of Computer Science
Boyi Xie, The Center for Computational Learning Systems
Rebecca J. Passonneau, The Center for Computational Learning Systems
Austin Lee, The Center for Computational Learning Systems
Barbara Taranto, New York Public Library
Similar resources
Learning Parameters of the K-Means Algorithm From Subjective Human Annotation
The New York Public Library is participating in the Chronicling America initiative to develop an online searchable database of historically significant newspaper articles. Microfilm copies of the papers are scanned and high resolution OCR software is run on them. The text from the OCR provides a wealth of data and opinion for researchers and historians. However, the categorization of articles p...
Textual Article Clustering in Newspaper Pages
In the analysis of a newspaper page an important step is the clustering of various text blocks into logical units, i.e., into articles. We propose three algorithms based on text processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms and experiment on actual pages from the Italian newspaper L’Adige, we select one of the algorithms as the pre...
Clustering in Newspaper Pages
In the analysis of a newspaper page an important step is the clustering of various text blocks into logical units, i.e., into articles. We propose three algorithms based on text processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms and experimentation on actual pages from the Italian newspaper L’Adige, we select one of the algorithms as th...
Searching the news: Using a rich ontology with time-bound roles to search through annotated newspaper archives
A frequent motivation for annotating documents using ontologies is to allow more efficient search. For collections of newspaper articles, it is often difficult to find specific articles based on keywords or topics alone. This paper describes a system that uses a formalisation of the content of newspaper articles to answer complex queries. The data for this system is created using Relational Con...
Evaluation Set for Slovak News Information Retrieval
This work proposes an information retrieval evaluation set for the Slovak language. A set of 80 queries written in the natural language is given together with the set of relevant documents. The document set contains 3980 newspaper articles sorted into 6 categories. Each document in the result set is manually annotated for relevancy with its corresponding query. The evaluation set is mostly comp...
Journal: CoRR
Volume: abs/1208.3530
Publication date: 2012